YTDB-604: Lazy RID-only iteration for MATCH traversal steps by sandrawar · Pull Request #863 · JetBrains/youtrackdb

sandrawar · 2026-03-25T14:50:13Z

PR Title:

YTDB-604: Lazy RID-only iteration for MATCH traversal steps

Motivation:

Async-profiler data from LDBC IC5 (128K+ traversals) showed that the MATCH engine loads every intermediate vertex from storage (loadEntity()) even when only the RID is needed for traversal to the next hop. This causes unnecessary disk I/O, deserialization (EntityImpl.deserializeProperties() — 1.45% CPU), and GC pressure from short-lived ResultInternal objects wrapping full entities.

Most intermediate MATCH steps only need the RID — properties are only read at the final projection (RETURN post.title). By deferring loadEntity() to first property access, we skip I/O entirely for vertices that are just traversal waypoints or get rejected by downstream WHERE filters.

The fix adds ridIterator() to VertexFromLinkBagIterable, which yields bare RecordId objects from the LinkBag without touching storage. MatchEdgeTraverser.toExecutionStream() uses this path for VertexFromLinkBagIterable results. ResultInternal's existing lazy loading handles the rest — getIdentity() returns the RID immediately, getProperty() triggers loadEntity() on first access.

Class and RID pre-filters are preserved (both operate on the RID, no I/O needed).
No behavioral change for non-MATCH consumers: iterator() still returns loaded Vertex objects.
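
To make the deferred-load idea concrete, here is a minimal, self-contained sketch. It does not use the real YouTrackDB classes: `Rid`, `CountingStorage`, `LazyResult`, and this `ridIterator` are hypothetical stand-ins, and the pre-filter on the collection id is only an assumed example of a RID-level filter. The point is that iterating and filtering touch no storage, and a load happens only on the first property access.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative stand-ins only; none of these are the real YouTrackDB classes. */
final class LazyRidSketch {

  /** Hypothetical record id: collection id plus position. */
  record Rid(int collectionId, long position) {}

  /** Hypothetical storage that counts how many records were actually loaded. */
  static final class CountingStorage {
    private final Map<Rid, Map<String, Object>> records;
    int loads = 0;

    CountingStorage(Map<Rid, Map<String, Object>> records) {
      this.records = records;
    }

    Map<String, Object> load(Rid rid) {
      loads++;
      return records.get(rid);
    }
  }

  /** Result wrapper that knows its RID up front and loads properties on first access. */
  static final class LazyResult {
    private final Rid rid;
    private final CountingStorage storage;
    private Map<String, Object> loaded; // stays null until a property is requested

    LazyResult(Rid rid, CountingStorage storage) {
      this.rid = rid;
      this.storage = storage;
    }

    Rid getIdentity() {
      return rid; // no storage access
    }

    Object getProperty(String name) {
      if (loaded == null) {
        loaded = storage.load(rid); // deferred load, at most once per result
      }
      return loaded.get(name);
    }
  }

  /** Yields only the RIDs that pass a RID-level pre-filter, without loading anything. */
  static Iterator<Rid> ridIterator(List<Rid> linkBag, Predicate<Rid> preFilter) {
    return linkBag.stream().filter(preFilter).iterator();
  }

  /** Walks the link bag, reads one property twice, and returns the number of loads. */
  static int demoLoadCount() {
    Rid a = new Rid(12, 0);
    Rid b = new Rid(12, 1);
    Rid c = new Rid(13, 0);
    CountingStorage storage = new CountingStorage(Map.of(
        a, Map.<String, Object>of("title", "first"),
        b, Map.<String, Object>of("title", "second"),
        c, Map.<String, Object>of("title", "third")));

    List<LazyResult> results = new ArrayList<>();
    Iterator<Rid> it = ridIterator(List.of(a, b, c), rid -> rid.collectionId() == 12);
    while (it.hasNext()) {
      results.add(new LazyResult(it.next(), storage)); // identities only, no loads yet
    }

    results.get(0).getProperty("title"); // first access triggers the load
    results.get(0).getProperty("title"); // second access reuses the loaded record
    return storage.loads;
  }

  public static void main(String[] args) {
    System.out.println("loads = " + demoLoadCount());
  }
}
```

In this sketch two of the three RIDs pass the pre-filter, and only the one whose property is read is ever loaded.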

gemini-code-assist · 2026-03-25T14:50:38Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a crucial performance optimization for the database's MATCH engine. By implementing a lazy loading strategy for intermediate traversal steps, the system now avoids materializing full Result objects when only Record IDs are needed. This change drastically cuts down on disk I/O, CPU cycles spent on deserialization, and memory overhead from short-lived objects, leading to more efficient execution of complex graph traversals, especially in scenarios with large datasets.

Highlights

Performance Enhancement: Introduced lazy RID-only iteration for the MATCH engine's traversal steps, significantly reducing unnecessary disk I/O, deserialization, and garbage collection pressure by deferring entity loading.
New ridIterator(): Added a ridIterator() method to VertexFromLinkBagIterable that directly yields RecordId objects from the LinkBag, applying filters without loading full entities from storage.
MatchEdgeTraverser Integration: Updated MatchEdgeTraverser.toExecutionStream() to leverage the new ridIterator() for VertexFromLinkBagIterable, ensuring that intermediate MATCH steps benefit from the lazy loading mechanism.
Non-Disruptive Change: Confirmed that existing class and RID pre-filters remain effective and I/O-free, and that the change does not affect non-MATCH consumers, who continue to receive fully loaded Vertex objects.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant optimization for the MATCH query engine by implementing a "RID-only path" for VertexFromLinkBagIterable. A new ridIterator() method is added to VertexFromLinkBagIterable which allows iterating over RecordId objects directly from a LinkBag without loading the full entities, applying class and RID filters. The MatchEdgeTraverser.toExecutionStream method is updated to utilize this new iterator, enabling lazy loading of entities in MATCH traversals. Comprehensive unit tests have been added to validate the functionality and lazy-loading behavior of the new ridIterator() and its integration with the MATCH execution stream. I have no feedback to provide as there were no review comments.

github-actions · 2026-03-25T15:39:35Z

Test Count Gate Results

Tolerance: 5% drop allowed per module

Overall: ✅ 18196 tests (baseline: 18184, +12)

Module	Baseline	Current	Change	Status
`core`	7639	7651	+12	✅
`docker-tests`	1891	1891	+0	✅
`embedded`	1931	1931	+0	✅
`examples`	3	3	+0	✅
`gremlin-annotations`	30	30	+0	✅
`jmh-ldbc`	39	39	+0	✅
`server`	5504	5504	+0	✅
`tests`	1147	1147	+0	✅
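
For reference, the per-module tolerance rule can be written as a one-line check. This is a sketch of how such a gate could be evaluated, not the CI job's actual code; the class name, the method name, and the treatment of the exact boundary (a drop of exactly 5% passes here) are assumptions.

```java
/** Hypothetical sketch of a test-count gate with a percentage drop tolerance. */
final class TestCountGate {

  /**
   * Passes when the current count is no more than tolerancePercent below the baseline.
   * Integer arithmetic keeps the boundary case exact.
   */
  static boolean passes(int baseline, int current, int tolerancePercent) {
    return current * 100L >= baseline * (100L - tolerancePercent);
  }

  public static void main(String[] args) {
    // core module from the table above: baseline 7639, current 7651
    System.out.println(passes(7639, 7651, 5));
  }
}
```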

github-actions · 2026-03-25T15:39:35Z

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 97.2% (35/36 lines)

File	Coverage	Uncovered Lines
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexFromLinkBagIterable.java`	✅ 100.0% (23/23)	-
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/ResultInternal.java`	✅ 90.9% (10/11)	482
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java`	✅ 100.0% (2/2)	-

Branch Coverage: ✅ 91.7% (22/24 branches)

File	Coverage	Lines with Uncovered Branches
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexFromLinkBagIterable.java`	✅ 100.0% (16/16)	-
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/ResultInternal.java`	✅ 75.0% (6/8)	477, 484

github-actions · 2026-04-03T00:03:30Z

JMH LDBC Benchmark Comparison

Base: 137673eae6 (fork-point with develop) | Head: 9936243632
Throughput: 🔴 2 regression(s), 🟢 6 improvement(s) (>±5% threshold, non-overlapping error bars, <10% relative error)
Scalability: 🔴 1 scaling regression(s), 🟢 2 scaling improvement(s) (>±5% threshold, non-overlapping error bars, <10% relative error)

Single-Thread Results

Benchmark	Base ops/s	Base err	Head ops/s	Head err	Δ%
ic10_friendRecommendation	0.120	±2.8%	0.112	±3.8%	-6.5% 🔴
ic11_jobReferral	35.4	±2.2%	34.7	±3.1%	-2.1%
ic12_expertSearch	21.2	±1.5%	21.3	±0.8%	+0.3%
ic13_shortestPath	3,720	±6.4%	3,628	±6.7%	-2.5%
ic1_transitiveFriends	33.6	±1.1%	40.6	±3.6%	+20.8% 🟢
ic2_recentFriendMessages	190.0	±3.6%	182.6	±3.3%	-3.9%
ic3_friendsInCountries	0.139	±1.3%	0.135	±2.1%	-3.4%
ic4_newTopics	3.6	±7.2%	3.3	±1.7%	-8.0%
ic5_newGroups	0.082	±28.3%	0.082	±28.2%	-0.9%
ic6_tagCoOccurrence	3.4	±3.9%	3.4	±3.9%	-1.9%
ic7_recentLikers	54.6	±6.5%	91.4	±3.8%	+67.4% 🟢
ic8_recentReplies	823.7	±1.3%	804.7	±1.7%	-2.3%
ic9_recentFofMessages	1.2	±1.3%	1.2	±1.4%	-0.7%
is1_personProfile	42,180	±1.6%	40,744	±2.8%	-3.4%
is2_personPosts	489.3	±1.6%	483.0	±1.6%	-1.3%
is3_personFriends	12,050	±4.1%	11,955	±3.9%	-0.8%
is4_messageContent	60,369	±2.5%	60,196	±1.8%	-0.3%
is5_messageCreator	54,456	±1.7%	54,368	±2.0%	-0.2%
is6_messageForum	36,194	±2.5%	36,017	±1.2%	-0.5%
is7_messageReplies	2,799	±1.5%	5,048	±1.0%	+80.4% 🟢

Multi-Thread Results

Benchmark	Base ops/s	Base err	Head ops/s	Head err	Δ%
ic10_friendRecommendation	0.579	±2.1%	0.583	±1.1%	+0.7%
ic11_jobReferral	162.4	±2.3%	161.3	±2.1%	-0.7%
ic12_expertSearch	97.8	±2.0%	97.6	±0.9%	-0.3%
ic13_shortestPath	19,842	±5.5%	19,748	±4.9%	-0.5%
ic1_transitiveFriends	175.7	±1.6%	213.6	±2.2%	+21.6% 🟢
ic2_recentFriendMessages	1,000	±2.3%	963.4	±1.2%	-3.7%
ic3_friendsInCountries	0.646	±1.4%	0.658	±1.0%	+1.9%
ic4_newTopics	14.8	±2.1%	14.0	±3.1%	-5.3% 🔴
ic5_newGroups	0.316	±7.7%	0.316	±7.3%	-0.1%
ic6_tagCoOccurrence	16.7	±1.4%	16.3	±1.9%	-2.2%
ic7_recentLikers	263.2	±2.7%	398.4	±1.9%	+51.4% 🟢
ic8_recentReplies	4,176	±1.2%	4,104	±0.7%	-1.7%
ic9_recentFofMessages	6.6	±4.2%	6.3	±3.7%	-4.3%
is1_personProfile	177,824	±2.5%	174,485	±1.5%	-1.9%
is2_personPosts	2,329	±0.7%	2,291	±0.6%	-1.6%
is3_personFriends	58,699	±2.2%	59,286	±1.6%	+1.0%
is4_messageContent	241,305	±2.0%	244,511	±1.1%	+1.3%
is5_messageCreator	218,281	±1.8%	216,569	±1.4%	-0.8%
is6_messageForum	144,191	±1.7%	144,704	±1.7%	+0.4%
is7_messageReplies	14,177	±1.3%	23,473	±1.2%	+65.6% 🟢

Scalability (MT/ST ratio)

Benchmark	Base ratio	Base err	Head ratio	Head err	Δ%
ic10_friendRecommendation	4.82x	±3.5%	5.19x	±4.0%	+7.8% 🟢
ic11_jobReferral	4.59x	±3.2%	4.65x	±3.7%	+1.4%
ic12_expertSearch	4.61x	±2.5%	4.58x	±1.2%	-0.5%
ic13_shortestPath	5.33x	±8.4%	5.44x	±8.3%	+2.0%
ic1_transitiveFriends	5.23x	±1.9%	5.26x	±4.2%	+0.7%
ic2_recentFriendMessages	5.27x	±4.2%	5.28x	±3.5%	+0.2%
ic3_friendsInCountries	4.64x	±1.9%	4.89x	±2.3%	+5.4% 🟢
ic4_newTopics	4.10x	±7.5%	4.22x	±3.6%	+2.9%
ic5_newGroups	3.84x	±29.3%	3.87x	±29.2%	+0.7%
ic6_tagCoOccurrence	4.84x	±4.2%	4.82x	±4.4%	-0.3%
ic7_recentLikers	4.82x	±7.0%	4.36x	±4.2%	-9.6%
ic8_recentReplies	5.07x	±1.8%	5.10x	±1.8%	+0.6%
ic9_recentFofMessages	5.47x	±4.4%	5.27x	±3.9%	-3.6%
is1_personProfile	4.22x	±2.9%	4.28x	±3.2%	+1.6%
is2_personPosts	4.76x	±1.8%	4.74x	±1.7%	-0.3%
is3_personFriends	4.87x	±4.6%	4.96x	±4.2%	+1.8%
is4_messageContent	4.00x	±3.2%	4.06x	±2.1%	+1.6%
is5_messageCreator	4.01x	±2.4%	3.98x	±2.4%	-0.6%
is6_messageForum	3.98x	±3.0%	4.02x	±2.1%	+0.9%
is7_messageReplies	5.07x	±2.0%	4.65x	±1.5%	-8.2% 🔴
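
The flagging rule quoted in the report header (>±5% threshold, non-overlapping error bars, <10% relative error) can be sketched as follows. This is an assumed reading of that rule, not the benchmark job's real implementation: `Score`, `Verdict`, and `classify` are hypothetical names, and the error bars are taken to be value × (1 ± err%). Because the tables print rounded figures, a borderline row (for example the ic10 scalability ratio) can classify differently here than in the report.

```java
/** Hypothetical sketch of the regression/improvement flagging rule. */
final class BenchmarkFlag {

  enum Verdict { REGRESSION, IMPROVEMENT, UNFLAGGED }

  /** A score with its relative error in percent, e.g. 0.120 ±2.8%. */
  record Score(double value, double errPct) {
    double low() {
      return value * (1.0 - errPct / 100.0);
    }

    double high() {
      return value * (1.0 + errPct / 100.0);
    }
  }

  static Verdict classify(Score base, Score head) {
    double deltaPct = (head.value() / base.value() - 1.0) * 100.0;
    boolean beyondThreshold = Math.abs(deltaPct) > 5.0;
    boolean barsDisjoint = head.high() < base.low() || head.low() > base.high();
    boolean preciseEnough = base.errPct() < 10.0 && head.errPct() < 10.0;
    if (!(beyondThreshold && barsDisjoint && preciseEnough)) {
      return Verdict.UNFLAGGED;
    }
    return deltaPct < 0 ? Verdict.REGRESSION : Verdict.IMPROVEMENT;
  }

  public static void main(String[] args) {
    // ic10 single-thread row: flagged as a regression in the report
    System.out.println(classify(new Score(0.120, 2.8), new Score(0.112, 3.8)));
    // ic4 single-thread row: a larger drop, but the error bars overlap
    System.out.println(classify(new Score(3.6, 7.2), new Score(3.3, 1.7)));
  }
}
```

Under this reading, the unflagged ic4 single-thread row (-8.0%) and ic7 scalability row (-9.6%) are explained by overlapping error bars, and ic5 is excluded by its ~28% relative error.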

andrii0lomakin · 2026-04-03T02:34:58Z

Hi @sandrawar, please profile the regressions with async-profiler on a Hetzner CCX33 node and find out what caused them.

…ate steps

…ding
Eliminate the redundant isBlob() schema snapshot lookup in the lazy loading path.
Before: getProperty() called isEntity() → isBlob() [lookup #1], then asEntity() called isEntity() → isBlob() [lookup #2] → loadEntity().
After: getProperty() does an instanceof check → isBlob() [single lookup] → loadEntity() directly.
This halves the per-entity lookup overhead of the lazy RID path introduced by ridIterator(), fixing the IC4 NOT-pattern regression, where thousands of post entities are loaded via the lazy path per query.
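
The before/after call chains in this commit message can be modelled with a lookup counter. The sketch below is only an illustration under stated assumptions: the method names mirror the ones mentioned in the message, but the bodies are hypothetical stand-ins (a static counter instead of a real schema snapshot, a string instead of an entity), and the instanceof check is omitted because it involves no lookup.

```java
/** Hypothetical model of the two lazy-load paths; not the real ResultInternal code. */
final class LazyLoadLookupSketch {

  /** Counts calls to the stand-in schema snapshot lookup. */
  static int schemaLookups = 0;

  static boolean isBlob() {
    schemaLookups++; // each call models one schema snapshot lookup
    return false;
  }

  static boolean isEntity() {
    return !isBlob();
  }

  static String loadEntity() {
    return "loaded entity";
  }

  static String asEntity() {
    if (!isEntity()) {
      throw new IllegalStateException("not an entity");
    }
    return loadEntity();
  }

  /** Before: isEntity() and asEntity() each reach isBlob(), so two lookups per load. */
  static String getPropertyBefore() {
    return isEntity() ? asEntity() : null;
  }

  /** After: a single isBlob() check, then loadEntity() directly. */
  static String getPropertyAfter() {
    return !isBlob() ? loadEntity() : null;
  }

  /** Resets the counter, runs one path, and returns how many lookups it performed. */
  static int countLookups(Runnable path) {
    schemaLookups = 0;
    path.run();
    return schemaLookups;
  }

  public static void main(String[] args) {
    System.out.println("before: " + countLookups(LazyLoadLookupSketch::getPropertyBefore));
    System.out.println("after:  " + countLookups(LazyLoadLookupSketch::getPropertyAfter));
  }
}
```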

sandrawar · 2026-04-11T12:08:34Z

> Hi @sandrawar, please profile the regressions with async-profiler on a Hetzner CCX33 node and find out what caused them.

Profiling Summary

All 3 flagged regressions are measurement noise; there is no real performance degradation.

Triage run on an isolated Hetzner CCX33 (1 fork, 3 iterations, tier-appropriate warmup/measurement):

Benchmark	CI Δ	Triage Δ	HEAD ops/s	BASE ops/s	CI Overlap	Verdict
IC10 ST	-6.5%	-2.1%	0.143 ± 0.057	0.146 ± 0.064	Massive	Noise
IC4 MT	-5.3%	-3.2%	15.825 ± 3.057	16.355 ± 1.376	Significant	Noise
IS7 scaling	-8.2%	—	—	—	—	Not a regression

IC10 ST: Ultra-slow benchmark (~0.14 ops/s). The -6.5% reported by CI is within natural variance; triage shows -2.1% with nearly identical confidence intervals.
IC4 MT: The borderline -5.3% reported by CI drops to -3.2% on the isolated server, with wide, overlapping confidence intervals.
IS7 scalability: Throughput improved +80% ST / +66% MT. The ratio drop is a side effect of ST benefiting more from the lazy RID path, not a degradation.
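
The IS7 point can be checked by recomputing the MT/ST ratio from the is7_messageReplies throughput figures in the benchmark tables. The helper names below are made up for this sketch; the input numbers are the ones from the tables.

```java
/** Recomputes the IS7 scalability ratio from the reported throughput figures. */
final class ScalingRatio {

  /** Scalability ratio: multi-thread ops/s divided by single-thread ops/s. */
  static double ratio(double mtOps, double stOps) {
    return mtOps / stOps;
  }

  /** Relative change from base to head, in percent. */
  static double deltaPct(double base, double head) {
    return (head / base - 1.0) * 100.0;
  }

  public static void main(String[] args) {
    double baseRatio = ratio(14_177, 2_799); // base: MT / ST
    double headRatio = ratio(23_473, 5_048); // head: MT / ST
    System.out.printf("base %.2fx, head %.2fx, delta %.1f%%%n",
        baseRatio, headRatio, deltaPct(baseRatio, headRatio));
    System.out.printf("ST gain %.1f%%, MT gain %.1f%%%n",
        deltaPct(2_799, 5_048), deltaPct(14_177, 23_473));
  }
}
```

Both throughputs go up; the ratio falls only because the single-thread denominator grows faster (about +80%) than the multi-thread numerator (about +66%).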

sandrawar changed the title ~~YTDB-604: MATCH engine materializes full Result objects for intermedi…~~ YTDB-604: Lazy RID-only iteration for MATCH traversal steps Mar 25, 2026

gemini-code-assist bot reviewed Mar 25, 2026


sandrawar requested review from andrii0lomakin March 25, 2026 15:16

sandrawar force-pushed the lazy-result-match-traversal branch 3 times, most recently from 3039c3f to ce57cae Compare April 2, 2026 06:54

sandrawar force-pushed the lazy-result-match-traversal branch 4 times, most recently from dcdd241 to 96a8c1f Compare April 8, 2026 16:19

sandrawar added 3 commits April 9, 2026 08:24

YTDB-604: MATCH engine materializes full Result objects for intermedi…

090c350

…ate steps

YTDB-604: add missing VertexFromLinkBagIterable import

c3fa1fd

sandrawar force-pushed the lazy-result-match-traversal branch from 96a8c1f to b327547 Compare April 9, 2026 06:25

YTDB-604: regressions fix

9936243

sandrawar wants to merge 4 commits into develop from lazy-result-match-traversal
